ABSTRACT

Objectives We previously developed and reported on a prototype clinical decision support system (CDSS) for cervical cancer screening. However, the system is complex, as it is based on multiple guidelines and free-text processing, and it is therefore susceptible to failures. This report describes a formative evaluation of the system, a necessary step to ensure its deployment readiness.
Materials and methods Care providers who are 
potential end-users of the CDSS were invited to provide 
their recommendations for a random set of patients that 
represented diverse decision scenarios. The 
recommendations of the care providers and those 
generated by the CDSS were compared. Mismatched 
recommendations were reviewed by two independent 
experts. 

Results A total of 25 users participated in this study 
and provided recommendations for 175 cases. The CDSS 
had an accuracy of 87% and 12 types of CDSS errors 
were identified, which were mainly due to deficiencies in 
the system's guideline rules. When the deficiencies were 
rectified, the CDSS generated optimal recommendations 
for all failure cases, except one with incomplete 
documentation. 

Discussion and conclusions The crowd-sourcing approach for construction of the reference set, coupled with the expert review of mismatched recommendations, facilitated an effective evaluation and enhancement of the system by identifying decision scenarios that were missed by the system's developers. The described methodology will be useful for other researchers who seek to rapidly evaluate and enhance the deployment readiness of complex decision support systems.



INTRODUCTION 

Although cervical cancer can largely be prevented with screening, it remains a major cause of cancer-related death among women. 1 Several national organizations have released guidelines for cervical cancer screening and surveillance. 2-5 However, the guidelines are complex and are based on a multitude of factors. Consequently, they cannot easily be recalled by care providers, and many patients do not receive optimal screening. 6-9

As a potential solution, we previously developed and reported a prototype clinical decision support system (CDSS), which automatically analyzes patient data in the electronic health record (EHR) and suggests the guideline-based recommendation to care providers. 10 However, the system is susceptible to failures because of its complexity, as it is based on multiple guidelines and free-text processing. Another shortcoming of the prototype was that only a single guideline expert was involved in its development. Therefore, further evaluation was necessary to ensure the readiness of the system for deployment in clinical practice. This paper reports the methodology used to evaluate and improve the CDSS, with the participation of multiple users and experts, before clinical deployment. In contrast to the widely published summative evaluations that determine post-deployment effectiveness/impact, the aim of this work is to perform a formative evaluation before deployment, in order to ensure the system's post-deployment effectiveness.

BACKGROUND 

Cervical cancer screening 

Worldwide, cervical cancer was diagnosed in approximately 530 000 women and resulted in approximately 275 000 deaths in 2008. 11 Despite the confirmed effectiveness of routine screening, the American Cancer Society estimates 12 170 cases of cervical cancer and 4220 deaths in the USA in 2012. 1 A meta-analysis of 42 multinational studies reported that over half of the women diagnosed with cervical cancer had inadequate screening or no screening, and that lack of appropriate follow-up of abnormal tests contributed to 12% of diagnoses. 12

Cervical cancer screening/surveillance involves an evaluation of cervical cells (cytology) through a liquid-based specimen or Papanicolaou (Pap) smear. Human papilloma virus (HPV) testing may additionally be performed to detect the presence of high-risk strains of HPV (the cause of cervical pre-cancer and cancer). Several national organizations, including the American Cancer Society, US Preventive Services Task Force, American College of Obstetricians and Gynecologists and the American Society for Colposcopy and Cervical Pathology, have released guidelines for cervical cancer screening and/or management of abnormal screening tests. 2-5 However, the guidelines are complex and are based on a multitude of factors, including age, risk factors for cervical cancer and previous screening test results. Therefore, recalling and following the evidence-based guidelines is challenging for care providers, and as a result many patients do not receive optimal screening. 6-9

Apart from efforts to improve guideline adherence of the providers, several other interventions focused on patients have been investigated in the past two decades. 13 Interventions to improve screening rates complement strategies for reducing the risk factors for HPV infection. 14 They can be broadly categorized as education, 15 reminders, 16 interactive voice response 17 or telephone calls, 18 counseling 19 and economic incentives. 20 Reminders and educational interventions have been found to be most effective. 21-23 With the growing use of EHRs in the USA, the use of decision support systems such as ours to implement reminders for providers and patients has high potential for improving screening and surveillance rates. 24 The following subsection provides an overview of the challenges for the utilization of such systems.

Clinical decision support 

CDSS 25 26 have been developed for a variety of decision problems including preventive services, 27 28 therapeutic management, 29 prevention of adverse events, 30 diagnosis, 31 32 risk estimation, 33 and chronic disease management. 34 CDSS have been found to improve health service delivery across diverse settings, but there is sparse evidence for their impact on clinical outcomes. 35 The potential positive impact of CDSS on the quality of care is not always realized, because the systems are not always utilized or are not implemented effectively. 26 Some of the possible reasons for ineffective implementations are alert fatigue, 36 lack of accuracy, 37 lack of integration with workflow, 38 and prolonged response time. 39

Formative evaluations to ensure acceptable levels of the above performance parameters may play a crucial role in effective implementation. 40 In contrast to the widely published summative evaluations that determine the impact/effectiveness of the system, the aim of formative evaluations is to address, during the development phase itself, the factors that will determine effectiveness. 41 Formative evaluations have been emphasized as critical components of EHR implementation 42 and of health information technology projects in general. 43 Formative evaluations to rectify failure points of a CDSS before deployment may enhance the effectiveness of deployment in the clinical setting.

Our CDSS is particularly prone to multiple points of failure, because it is based on a complex model synthesized from multiple guidelines, it requires highly accurate natural language processing (NLP), which can be a challenging task, and it utilizes data from a multitude of information sources 10 (see figures 1 and 2). Moreover, the CDSS is intended to be comprehensive, generating screening and surveillance recommendations for all female primary care patients in the institution, which is a major advancement over current systems. 44-46

Therefore, a rigorous validation is required for our system to ensure user acceptability and clinical impact. This paper reports the methodology used to evaluate and improve the CDSS, with the participation of multiple users and experts, before clinical deployment. The objective is to ensure that the recommendations of the CDSS are of sufficient accuracy to be acceptable and useful to the providers. Testing for usability and workflow integration is excluded from the scope of the current study.

METHODS 

The recommendations of potential end-users for a random sample of patients were recorded and compared to the recommendations generated by the CDSS. Mismatched recommendations were resolved by independent experts, and an error analysis was performed to improve the CDSS. The study was conducted using a web-based application. The detailed methodology is as follows.

Overview of CDSS architecture 

As shown in figure 1, the CDSS has three modules: a data module, a guideline engine, and an NLP module. The latter two modules contain their respective rulebases, namely a guideline rulebase representing the screening and management guidelines and an NLP rulebase for interpreting cervical cytology (Pap) reports. When the CDSS is initiated for a particular patient, the guideline engine parses the guideline rules (figure 2) and queries the data module for the required patient parameters. The data module in turn interfaces with the EHR to retrieve the patient information; when the data involve free text, for example, a cytology report, the data module calls the NLP module to extract the relevant variables. Based on its constituent rules, the guideline engine continues to seek patient parameters until it has sufficient data to compute the recommendation. The architecture of the CDSS is elaborated elsewhere. 10
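The query-driven interaction between the three modules can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the class names, the two toy rules, the keyword-based cytology extraction and the patient field names are assumptions for illustration and do not reproduce the system's actual rulebases or EHR interface. What it shows is that parameters are requested lazily, so the NLP module is invoked only when a rule actually needs a free-text variable.

```python
from typing import Callable, Dict, List, Optional

Ask = Callable[[str], object]          # function a rule uses to request a parameter
Rule = Callable[[Ask], Optional[str]]  # returns a recommendation, or None if not applicable


class NLPModule:
    """Toy stand-in for the NLP rulebase: keyword matching on a cytology report."""

    def extract_cytology(self, report: str) -> str:
        text = report.lower()
        if "asc-us" in text or "undetermined significance" in text:
            return "ASCUS"
        if "negative for intraepithelial lesion" in text:
            return "normal"
        return "other"


class DataModule:
    """Retrieves patient parameters from a mock EHR record on demand."""

    def __init__(self, ehr: Dict[str, object], nlp: NLPModule) -> None:
        self.ehr = ehr
        self.nlp = nlp
        self.requested: List[str] = []  # log of parameters the engine asked for

    def get(self, name: str) -> object:
        self.requested.append(name)
        if name == "cytology":
            # Free-text source: delegate interpretation to the NLP module.
            return self.nlp.extract_cytology(str(self.ehr["cytology_report"]))
        return self.ehr[name]


def hysterectomy_rule(ask: Ask) -> Optional[str]:
    # Toy rule (assumption): no screening recommendation after hysterectomy.
    return "No screening (hysterectomy)" if ask("hysterectomy") else None


def ascus_hpv_not_performed_rule(ask: Ask) -> Optional[str]:
    # Scenario taken from table 2 (error 7); `and` short-circuits the HPV query.
    if ask("cytology") == "ASCUS" and ask("hpv") == "not performed":
        return "Cytology at 6 and 12 months"
    return None


class GuidelineEngine:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules

    def recommend(self, data: DataModule) -> str:
        # Rules are tried in order; parameters are fetched only when a rule asks.
        for rule in self.rules:
            result = rule(data.get)
            if result is not None:
                return result
        return "No recommendation"


if __name__ == "__main__":
    engine = GuidelineEngine([hysterectomy_rule, ascus_hpv_not_performed_rule])
    data = DataModule({"hysterectomy": False, "hpv": "not performed",
                       "cytology_report": "ASC-US present."}, NLPModule())
    print(engine.recommend(data))  # Cytology at 6 and 12 months
    print(data.requested)          # ['hysterectomy', 'cytology', 'hpv']
```

For a patient whose hysterectomy flag is true, the first rule fires and neither the cytology report nor the HPV result is ever retrieved, which mirrors the "seek parameters until sufficient" behavior described above.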

Expert review of guideline model 

Before initiating this study, the guideline model (rulebase implemented in the system) was reviewed and approved by several experts who did not participate in the development of the CDSS prototype. Figure 2 shows the flowchart representation of the system's guideline model.

Construction of test set 

We randomly selected 6033 patients who had visited Mayo Clinic Rochester in March 2012 and had consented to make their medical records available for research. The CDSS was run to compute the screening and surveillance recommendations for these patients. Based on the recommendations, the patients were mapped to the branches in the guideline flowchart for cervical cancer screening/management (figure 2). This flowchart was developed before the 2012 updates in the national guidelines. 2-5 Each pathway in the flowchart corresponds to a distinct combination of patient variables and represents a unique decision scenario. As some decision scenarios occur more frequently in practice than others, a randomly selected test set can be biased towards the frequent decision scenarios. Therefore, to ensure that the evaluation was not biased towards the frequent scenarios, we performed stratified random sampling, restricting the selection to a maximum of 14 cases per decision scenario. The total number of cases in the test set was 196.

Figure 1 Architecture of the system. CDSS, clinical decision support system; EHR, electronic health record.

Figure 2 Guideline flowchart for the proof of concept system. It represents the guideline rulebase implemented in the clinical decision support system. ASC-US, atypical squamous cells of undetermined significance; G/C, gynecology clinic; HPV, human papilloma virus; PAP, Papanicolaou.
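The capped stratified sampling described above can be sketched in a few lines of Python. The scenario labels, the seed and the toy population are hypothetical, and the sketch assumes that a scenario with fewer than 14 eligible patients contributes all of them (how such scenarios were handled is not stated here).

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def capped_stratified_sample(patient_ids: Sequence[str],
                             scenario_of: Dict[str, str],
                             cap: int = 14,
                             seed: int = 0) -> List[str]:
    """Draw up to `cap` patients at random from each decision scenario.

    Scenarios with fewer than `cap` patients contribute all of them, so rare
    pathways are fully represented while frequent ones are truncated.
    """
    rng = random.Random(seed)
    strata: Dict[str, List[str]] = defaultdict(list)
    for pid in patient_ids:
        strata[scenario_of[pid]].append(pid)

    sample: List[str] = []
    for scenario in sorted(strata):  # fixed order for reproducibility
        members = strata[scenario]
        sample.extend(rng.sample(members, min(cap, len(members))))
    return sample


if __name__ == "__main__":
    # Hypothetical population: one frequent scenario and two rarer ones.
    scenario_of = {}
    for i in range(100):
        scenario_of[f"p{i}"] = "normal cytology, low risk"
    for i in range(100, 105):
        scenario_of[f"p{i}"] = "ASCUS, HPV not performed"
    for i in range(105, 125):
        scenario_of[f"p{i}"] = "unsatisfactory cytology"
    test_set = capped_stratified_sample(list(scenario_of), scenario_of)
    print(len(test_set))  # 33 = 14 + 5 + 14
```

Without the cap, a simple random sample of 33 from this toy population would be dominated by the frequent scenario and could easily miss the five ASCUS patients entirely.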

User participation 

We invited 89 potential users of the CDSS to participate in this study. The recruitment was done by sending mass emails as well as by specifically contacting potential users. The participants were of diverse background and training. They included staff consultants, residents and nurse practitioners from the institution's departments of family medicine, internal medicine and obstetrics and gynecology. We created a web-based application to collect the recommendations of the healthcare providers for the test set (figure 3). The web application was deployed on the institution's internal network.

Collection of provider recommendations 

The web system was available from 12 April 2012 to 4 May 2012. When a participant logged into the system, a 1-min training video was presented. After the video, the web system randomly selected (without repetition) a case number from the test set and presented it to the participant. The participants assessed the information for the presented case by chart review using the EHR system, and recorded the most appropriate guideline-based recommendation for the case by selecting the appropriate options in the web system's interface (figure 3). In addition to the template recommendation options, a free-text box was provided to allow the participants to input recommendations that were not covered by the template options. Each participant completed seven different cases. The web system also recorded the time taken by the providers to input their recommendations.
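The case allocation can be sketched as a shared pool that is drawn from without replacement. This is a hypothetical illustration: it assumes that "without repetition" means no case was shown to more than one participant, and the class and method names are invented. Under that assumption, 25 completing participants with seven cases each use 175 of the 196 test cases, which matches the number of annotated cases reported in the results.

```python
import random
from typing import Dict, List, Optional, Sequence


class CaseDispenser:
    """Hands out test-set cases at random without repetition (hypothetical).

    Assumption: a single shared pool, so no case is shown to two participants.
    """

    def __init__(self, case_ids: Sequence[int], cases_per_participant: int = 7,
                 seed: Optional[int] = None) -> None:
        self.pool: List[int] = list(case_ids)
        random.Random(seed).shuffle(self.pool)
        self.per_participant = cases_per_participant
        self.assigned: Dict[str, List[int]] = {}

    def next_case(self, participant: str) -> Optional[int]:
        cases = self.assigned.setdefault(participant, [])
        if len(cases) >= self.per_participant or not self.pool:
            return None  # quota reached or pool exhausted
        case_id = self.pool.pop()
        cases.append(case_id)
        return case_id


if __name__ == "__main__":
    dispenser = CaseDispenser(range(1, 197), seed=42)  # 196 test cases
    for n in range(25):                                 # 25 completing participants
        while dispenser.next_case(f"provider{n}") is not None:
            pass
    shown = [c for cases in dispenser.assigned.values() for c in cases]
    print(len(shown), len(set(shown)), len(dispenser.pool))  # 175 175 21
```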

Analysis 

The care providers' recommendations were compared with those of the CDSS (figures 4 and 5). When there was a mismatch in the recommendations, the case was reviewed by one of two experts who did not participate in the development of the prototype, to decide whether the CDSS or the provider recommendation was more accurate/optimal. If the CDSS was found to be less optimal, an error analysis was performed to identify the fault in the CDSS. The CDSS was then improved to correct the identified errors.

Figure 3 Interface of the web system used by care providers to participate in the study. HPV, human papilloma virus; PAP, Papanicolaou.

Figure 4 Study design. CDSS, clinical decision support system.
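The comparison and adjudication logic can be summarized as a small routing function. The field names, the outcome labels and the exact-string comparison of recommendations are assumptions made for illustration; because participants chose from template options (with a free-text fallback), a real comparison would operate on structured options. Cases in which the CDSS raised an error flag, as reported in the results, are modeled here as a missing CDSS recommendation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedCase:
    case_id: int
    provider_rec: str
    cdss_rec: Optional[str]               # None models a CDSS error flag
    expert_prefers: Optional[str] = None  # "cdss" or "provider"; set after review


def route(case: ReviewedCase) -> str:
    """Return the analysis outcome for one case under the study design."""
    if case.cdss_rec is None:
        return "excluded (CDSS error flag)"
    if case.cdss_rec == case.provider_rec:
        return "match"  # not sent for expert review
    if case.expert_prefers is None:
        return "pending expert review"
    if case.expert_prefers == "provider":
        return "CDSS suboptimal: error analysis"
    return "CDSS optimal"


if __name__ == "__main__":
    print(route(ReviewedCase(1, "Pap at 3 years", "Pap at 3 years")))           # match
    print(route(ReviewedCase(2, "Pap now", "No screening", "provider")))        # CDSS suboptimal: error analysis
    print(route(ReviewedCase(3, "Pap now", None)))                              # excluded (CDSS error flag)
```

Only the "CDSS suboptimal" outcome feeds the error analysis; matches are never reviewed, which is the design choice discussed under Limitations.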



Projection of CDSS impact on clinical practice 

The CDSS was modified and re-evaluated on the test set, in order to ensure that the errors identified in the above analysis were rectified. Finally, we compared the recommendations of the corrected CDSS with those of the providers to identify provider errors. These cases were analyzed to identify the decision scenarios that were difficult for the providers, in order to project the potential of the CDSS to assist with the decisions. The average time taken by the providers to make the recommendations was computed after excluding outliers.
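The outlier criterion is not specified here, so the following Python sketch swaps in one common choice, Tukey's fences at 1.5 times the interquartile range, purely as an assumption. The timings are hypothetical; a very long value stands in for a participant who left a case open.

```python
import statistics
from typing import Sequence


def mean_excluding_outliers(seconds: Sequence[float], k: float = 1.5) -> float:
    """Mean after dropping values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR).

    The fence rule is an assumed stand-in for the study's unspecified criterion.
    """
    q1, _, q3 = statistics.quantiles(seconds, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [s for s in seconds if low <= s <= high]
    return statistics.mean(kept)


def as_min_sec(seconds: float) -> str:
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


if __name__ == "__main__":
    timings = [90, 95, 100, 105, 110, 3600]  # hypothetical input times in seconds
    print(mean_excluding_outliers(timings))   # 100
    print(as_min_sec(99))                     # 1 min 39 s
```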

RESULTS 

Figure 5 summarizes the results of the CDSS evaluation. Of the 
89 providers who were invited to participate in the study, 28 
agreed to participate, and finally 25 completed the exercise of 
annotating the test cases with their recommendation. A total of 
175 cases was annotated by the participants. The CDSS was 
found to generate an error flag for six cases because it could not 
obtain the pathology reports due to bugs in the interface to the 
EHR system. In the remaining 169 cases, the recommendations 
by the healthcare providers did not match the recommendation 
made by the CDSS for 75 cases. 

The mismatch cases were presented to one of two experts (who co-authored this paper). The experts reviewed the recommendations and decided on the final optimal recommendation for the patient. The experts were blinded to the identity of the healthcare provider who made the recommendation for the individual cases. The CDSS was found to be suboptimal compared to the provider in 22 cases. Therefore, the accuracy was 147/169 = 87.0% (figure 5 and table 1).
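The accuracy figure follows directly from the case counts reported in this section. Note that the calculation counts every case in which the provider and the CDSS agreed as correct, because only mismatches were sent for expert review (see Limitations); the variable names are illustrative.

```python
annotated = 175       # cases annotated by the 25 participants
error_flags = 6       # CDSS could not retrieve the pathology report
mismatches = 75       # provider and CDSS recommendations differed
cdss_suboptimal = 22  # mismatches in which the expert preferred the provider

evaluable = annotated - error_flags         # 169
matches = evaluable - mismatches            # 94, assumed correct (not reviewed)
cdss_upheld = mismatches - cdss_suboptimal  # 53
correct = matches + cdss_upheld             # 147
accuracy = correct / evaluable

print(f"{correct}/{evaluable} = {accuracy:.1%}")  # 147/169 = 87.0%
```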

CDSS error analysis 

Figure 5 Summary of test set construction and CDSS evaluation results, showing the number of cases in each step of the study. CDSS, clinical decision support system; EHR, electronic health record.

Table 1 Distribution of CDSS errors over different decision scenarios
[Table body not recoverable from the source. It tabulated the evaluated cases by combination of patient variables (including hysterectomy, age (years), HPV, risk and cytology), with grouped decision scenarios and CDSS errors in the last two columns.]

Analysis of the 22 CDSS failure cases led to the identification of 12 errors/failure points in the CDSS (table 2 and figure 6). The errors were classified as modeling errors and programming errors. Modeling errors are due to deficiencies in the system's guideline rulebase/model, for example, a missing decision scenario or incorrect logic. Programming errors include errors/bugs in the developed software, for example, incorrect rounding for the age cut-off. The CDSS was robust in extracting the patient information from the EHR, except for history of hysterectomy. A summary of the errors is as follows (figure 6):

► The upper age limit for screening recommendations was not set, because the approach was to err on the side of caution and let the provider overrule the system's recommendation to stop screening (errors 1, 2, 4 and 9). This has now been rectified by considering the high-risk status of the patients.

► Some of the error cases were due to the system stopping screening after the patient's 65th birthday. In these cases the age limit was applied after rounding the age (error 2). Therefore, to define the age explicitly and avoid rounding, the guideline model has been changed to the condition of <66 instead of <65 as defined earlier.

► History of hysterectomy was missed when it was reported in the problem list. This was a programming error that has been resolved (error 6). In one case, history of hysterectomy was not explicitly recorded in the problem list but was documented in the clinical notes, which are not searched by the system. This case was resolved after concepts that imply hysterectomy, for example, 'vaginal wall prolapse after hysterectomy', were included for determining history of hysterectomy, as such a concept was present in the patient's problem list (error 10).

► The scenario of atypical squamous cells of undetermined significance (ASCUS) cytology with HPV not performed was not anticipated. This has now been included in the corrected model (error 7). A report of inadequate endocervical transformation zone is now ignored for high-risk patients, because it does not affect their management: they already undergo annual screening (error 12).
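The age boundary problem behind error 2 can be reproduced with a short Python sketch. How age was originally computed is described here only as "rounding", so the sketch assumes rounding of fractional years for the original behavior and age in completed years for the corrected <66 condition; the function names are hypothetical, and all other conditions (risk status, prior screening history) are omitted.

```python
from datetime import date


def completed_years(birth: date, on: date) -> int:
    """Age in whole years, incremented only on the birthday itself."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def rounded_years(birth: date, on: date) -> int:
    """Age rounded to the nearest year (assumed original behavior)."""
    return round((on - birth).days / 365.25)


def below_old_cutoff(age: int) -> bool:
    return age < 65  # original condition


def below_corrected_cutoff(age: int) -> bool:
    return age < 66  # corrected condition: continue through age 65


if __name__ == "__main__":
    birth = date(1950, 1, 1)
    before_65 = date(2014, 9, 1)  # 64 years 8 months
    after_65 = date(2015, 6, 1)   # 65 years 5 months
    age = rounded_years(birth, before_65)
    print(age, below_old_cutoff(age))        # 65 False: stopped before the 65th birthday
    age = completed_years(birth, after_65)
    print(age, below_corrected_cutoff(age))  # 65 True: continues until the 66th birthday
```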

After the errors were rectified, the CDSS was found to generate optimal recommendations for all but one failure case. The one case that could not be resolved was due to the inability of the CDSS to identify history of hysterectomy in a patient whose problem list and patient annual questionnaire database both lacked documentation of the hysterectomy. The experts inferred from the clinical notes that the patient had undergone hysterectomy. The CDSS failed because it was not designed to perform NLP on clinical notes to extract this information.
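The hysterectomy lookup over structured sources can be sketched as follows. Only 'vaginal wall prolapse after hysterectomy' is named in the text; the other concept strings, the exact matching of normalized problem-list entries and the function name are hypothetical. Returning None for "undocumented" makes the unresolved case explicit instead of silently treating it as "no hysterectomy".

```python
from typing import Iterable, Optional

# Hypothetical concept list; only the last entry is named in the text.
HYSTERECTOMY_CONCEPTS = frozenset({
    "history of hysterectomy",
    "status post hysterectomy",
    "vaginal wall prolapse after hysterectomy",
})


def hysterectomy_status(problem_list: Iterable[str],
                        questionnaire_answer: Optional[bool]) -> Optional[bool]:
    """True/False if a structured source documents the answer, else None.

    Clinical notes are not consulted, mirroring the limitation described above.
    """
    entries = {item.strip().lower() for item in problem_list}
    if entries & HYSTERECTOMY_CONCEPTS:
        return True
    return questionnaire_answer  # may itself be None (undocumented)


if __name__ == "__main__":
    print(hysterectomy_status(["Vaginal wall prolapse after hysterectomy"], None))  # True
    print(hysterectomy_status(["Hypertension"], None))                              # None
```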

Provider errors analysis 

After the recommendations of the corrected CDSS were compared to those recorded by the providers, the providers were found to have made suboptimal recommendations in 56 of the 169 cases (33.1%), which is 34 (20.1%) more cases with suboptimal recommendations compared to the CDSS. Several of these patients had abnormal screening reports such as abnormal (other than ASCUS) cytology, ASCUS cytology, positive HPV or inadequate endocervical transformation zone. Some of the provider errors were due to incorrect determination of the risk status of the patient, owing to boundary conditions such as age cut-offs. The mean time taken by the providers to make the recommendation was 1 min 39 s.

Table 1 notes: The combination of patient variables corresponds to decision scenarios that are grouped for readability and interpretation in the last two columns. ASCUS, atypical squamous cells of undetermined significance; CDSS, clinical decision support system; Cyto, cytology; ETZ, endocervical transformation zone; HPV, human papillomavirus; NP, not performed; Unsatis, unsatisfactory for evaluation.

Table 2 Listing and classification of CDSS errors (corresponds to figure 6)

Grouped decision scenarios | Error description | Error number | Type of error
Hysterectomy | Missed history of hysterectomy in problem list | 6 | Programming
Hysterectomy | Missed a case of hysterectomy when not mentioned in problem list, but found in clinical note. This information is now obtained from patient-provided data sources | 10 | Programming
Report absent | If cytology report is absent and age is <21 years, recommendation should be to perform Pap-HPV reflex at age 21 years, instead of giving no recommendation | 11 | Modeling
Report absent | When the cytology report is not found, there needs to be an upper age limit for recommending screening 'now' for low-risk patients. For high-risk patients, screening should be recommended even when age >65 years | | Modeling
ASCUS | Missing decision scenario: when cervical cytology is ASCUS and HPV is not performed, recommendation should be 'Cytology at 6 and 12 months' | 7 | Modeling
Cyto and HPV neg ETZ | When the cytology and HPV are negative but ETZ is inadequate, examine age instead of age at recent report to recommend next screening | 5 | Programming
Cyto and HPV neg ETZ | When cytology and HPV are negative and ETZ is inadequate, recommend Pap-HPV reflex at 6 or 12 months if last test was co-test, or recommend Pap-HPV co-test at 6 or 12 months if last test was reflex | | Modeling
Cyto and HPV neg ETZ | When the cytology and HPV are negative but ETZ is inadequate, there is a need for an upper age limit | | Modeling
Normal cytology | For recommending screening for high-risk patients, the upper age limit cut-off needs to be removed, as they would continue annual screening even if >65 years old. For low-risk patients with normal cytology, the upper age limit cut-off needs to be corrected | | Modeling
Unsatisfactory for evaluation | For low-risk patients, there needs to be an upper age limit for recommending repeat test after 3 months | 1 | Modeling

ASCUS, atypical squamous cells of undetermined significance; CDSS, clinical decision support system; CIN1, cervical intraepithelial neoplasia 1; HPV, human papilloma virus; Pap, Papanicolaou; ETZ, endocervical transformation zone.
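The percentages in the provider error analysis above can be checked from the reported counts. The 34-case difference is reproduced by subtracting the 22 suboptimal recommendations of the uncorrected CDSS from the 56 suboptimal provider recommendations; reading "compared to the CDSS" this way is an assumption.

```python
evaluable = 169
provider_suboptimal = 56  # provider recommendations judged suboptimal
cdss_suboptimal = 22      # suboptimal CDSS recommendations before correction

excess = provider_suboptimal - cdss_suboptimal
provider_rate = provider_suboptimal / evaluable
excess_rate = excess / evaluable

print(f"{provider_suboptimal}/{evaluable} = {provider_rate:.1%}")  # 56/169 = 33.1%
print(f"excess: {excess} ({excess_rate:.1%})")                      # excess: 34 (20.1%)
```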
Figure 6 Modified guideline flowchart. The numbers in red circles correspond to the errors described in the text and table 2. The yellow rectangles circumscribe the elements that were appended or modified to make the corrections. ASC-US, atypical squamous cells of undetermined significance; ETZ, endocervical transformation zone; G/C, gynecology clinic; HPV, human papilloma virus; PAP, Papanicolaou.
DISCUSSION

The study facilitated a comprehensive evaluation of the CDSS on a large and diverse set of patients that covered nearly all possible decision scenarios. The CDSS was found to have fair accuracy, and the error analysis of the failure cases led to considerable improvements to the CDSS.

The formative evaluation based on the reference set annotated by the care providers led to the identification of several failure points in the system. Several logical steps necessary to apply the national guidelines were missed when the guideline model was inspected by the experts before the study. The use of representative cases and their decision annotations by the care providers in this study helped draw attention to the particular scenarios in which the logical steps were missed. The task of modeling the free-text guidelines as rules is challenging due to ambiguity of the natural language used in the guidelines, and due to the difficulty in envisioning decision scenarios that can occur in clinical practice. 47 48 Our results indicate that guideline models based on abstraction from textual guidelines need to be tested with consistency checks on real-life cases. This finding is consistent with earlier research that demonstrates the critical importance of carefully analyzing the reasons for practising clinician disagreements with decision support, in order to improve CDSS design and effectiveness. 49

The analysis identified situations/factors in which the CDSS was prone to make errors, for example, hysterectomy cases. It also identified areas of the guidelines in which the care providers need decision support. The providers were found to have difficulties in decision making for cases with abnormal findings, as reported by other studies. 12 50 Lack of follow-up referral after a positive screening test has also previously been documented in the context of colorectal cancer screening. 51 52 As patients with abnormal screening reports are especially at risk of developing cancer, the screening/surveillance recommendations made by the providers can have far-reaching consequences for these patients. Notably, the CDSS was found to perform consistently well for such patients, and its deployment can be expected to improve the quality of the screening services considerably. Moreover, the CDSS could save provider time: in this study, providers took a mean of 1 min 39 s per patient to make a recommendation.

An alternative approach to evaluate the CDSS before deployment in clinical practice is to conduct a pilot study with a subset of potential end-users, who will verify the system's recommendation and provide feedback for improving the system. There are several disadvantages to this approach: the evaluation will be biased towards frequently occurring decision scenarios unless a special effort is made to identify the less frequent but high-impact scenarios in the evaluation; and there will be a risk of missing validation for rare but important decision scenarios. Our approach of identifying distinct decision scenarios for the evaluation by using the prototype CDSS helped avoid bias towards the frequent decision scenarios, and allowed for an efficient utilization of the efforts of the participating providers and experts.

Similarly, our approach of blinding the users to the CDSS recommendation has an advantage over seeking user feedback after deployment, because in the post-deployment setting, the user's judgment can be influenced by knowledge of the output of the CDSS. 49 Consequently, with the latter approach some of the failure points may be missed. Moreover, it may not be possible to project the clinical impact of the system, due to the modification of user behavior. With the current approach, the decision scenarios that were difficult for the users were identified, and the usefulness of the system after deployment could be projected. Another advantage is that the end-users are not directly exposed to the CDSS before the formative evaluation; therefore, there is no loss of user confidence. 53

A difficulty in performing CDSS evaluation is that it is often not feasible to involve a large number of users in system evaluation. The crowd-sourcing approach used in this study allowed a large number of users to participate, which in turn facilitated the construction of a large reference dataset of real-life decision scenarios. Consequently, the CDSS could be evaluated comprehensively for a wide variety of scenarios.

Literature on CDSS mainly consists of summative evaluations measuring impact on service and clinical outcomes. 54 55 Studies on performance aspects of CDSS are rare, which suggests a lack of effort to ensure effective implementation. Our results demonstrate that such studies may be increasingly needed as complex CDSS that have an increased risk of failures are developed. Furthermore, research into developing efficient and practically feasible methods for pre-deployment evaluation of CDSS is called for. We believe that the approach described will be useful for developing complex systems that support wider and more complex domains of care. 28 56 The formative evaluation to ensure that the decision model itself is accurate will facilitate subsequent enquiries after deployment for quantifying guideline adherence of the providers, and for measuring clinical impact.

Crowd-sourcing can be useful for the development and validation of decision support applications. McCoy et al 57 have earlier used crowd-sourcing for building a knowledge base of problem-medication pairs. In their institution it was mandatory for clinicians to link prescriptions to patient problems, and McCoy et al 57 leveraged the resulting database as a resource to construct their knowledge base. In contrast, our approach was to seek volunteer effort from the care providers to create a gold standard for validating the CDSS.

Our analysis identified the incompleteness of the problem list and of patient-provided information on hysterectomy as a challenge to the accurate working of the CDSS for the subset of patients with hysterectomy. We plan to extend the NLP module of the CDSS to identify history of hysterectomy from clinical notes if more such patients are encountered in the future. 56 Overall, the CDSS has a high level of accuracy, has the potential to improve providers' recommendations, especially in the high-utility areas of the guidelines, and can thereby significantly advance the quality of screening. However, the corrected CDSS was not tested with new cases, which would help determine whether further discrepancies in recommendations need to be addressed. We expect that the majority of the errors have been identified in the current analysis, and we plan to perform additional evaluations with a different set of cases to ensure system accuracy before deployment. 58

We restricted the scope of the evaluation to accuracy and did not test usability and integration with workflow, which are also major factors that determine the utilization and clinical impact of the CDSS. These will be tested separately with pilot studies. Nonetheless, we expect that eliminating (or at least minimizing) inaccurate recommendations will facilitate the subsequent pilots.

Limitations 

The use of an unfamiliar interface may have induced participants' mistakes, although we had provided a training video and designed a simple interface to record the participants' recommendations. On the other hand, the participants were focused on the task of making screening decisions, and their performance can be expected to be better than that of target users, who will have other tasks during the patient visit. Because of these factors, further research is necessary to determine the usefulness of our approach for quantifying provider errors. Nevertheless, our results indicate that the methodology is useful for qualitatively identifying the areas of decision making that are difficult for the providers.

Updated cervical cancer screening guidelines were published 
at the end of our evaluation period. 59 60 It is possible that some 
of the participating providers were aware of the forthcoming 
change in the guideline and provided recommendations in 
accordance with the anticipated guideline. 

We limited the expert review to cases in which there was a mismatch between the recommendations of the CDSS and the providers, because the proportion of errors is expected to be high in this subset of cases. Consequently, there is a chance of missing erroneous decisions when the recommendations of both the provider and the CDSS are not optimal. However, such cases are expected to be small in number and are likely to have a representation in the mismatch group. The strategy of focusing on the mismatch group facilitates a judicious use of the expert reviewers' efforts.

Double blinding of reviewers was not done. It may be useful 
to blind the expert reviewers as to whether the source of the 
recommendations was the care provider or CDSS. 

CONCLUSION 

Our case study demonstrates that crowd-sourcing the construction of the reference recommendation set, coupled with expert review of mismatched decisions, can facilitate an effective evaluation of the accuracy of a CDSS. It is especially useful for identifying decision scenarios that may be missed by the system's developers. The methodology will be useful for researchers who seek to rapidly evaluate and enhance the deployment readiness of next-generation decision support systems that are based on complex guidelines.